Visual word proximity and linguistics for semantic video indexing and near-duplicate retrieval

نویسندگان

  • Yu-Gang Jiang
  • Chong-Wah Ngo
چکیده

Please cite this article in press as: Y.-G. Jia Vis. Image Understand. (2008), doi:10.101 Bag-of-visual-words (BoW) has recently become a popular representation to describe video and image content. Most existing approaches, nevertheless, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and ontological aspects of visual words for video analysis. Two approaches, soft-weighting and constraint-based earth mover’s distance (CEMD), are proposed to model different aspects of visual word linguistics and proximity. In soft-weighting, visual words are cleverly weighted such that the linguistic meaning of words is taken into account for bin-to-bin histogram comparison. In CEMD, a cross-bin matching algorithm is formulated such that the ground distance measure considers the linguistic similarity of words. In particular, a BoW ontology which hierarchically specifies the hyponym relationship of words is constructed to assist the reasoning. We demonstrate soft-weighting and CEMD on two tasks: video semantic indexing and near-duplicate keyframe retrieval. Experimental results indicate that soft-weighting is superior to other popular weighting schemes such as term frequency (TF) weighting in large-scale video database. In addition, CEMD shows excellent performance compared to cosine similarity in near-duplicate retrieval. 2008 Elsevier Inc. All rights reserved.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Semantics of the Word Istikbar (Arrogance) in the Holy Quran based on Syntagmatic Relations(A Case Study of Semantic Proximity and Semantic Contrast)

The word istikbar (arrogance) is one of the key words in the monotheistic system of the Quran, which has found a special status as a special feature of the opponents and adversaries of the call to the truth. Given the prominent role of this issue in the human life system and its provision of corruption and moral deviations, it is necessary to represent the nature of the elements that make up th...

متن کامل

Assessing Semantic Relevance by Using Audiovisual Cues

This paper presents two complementary approaches for assessing semantic relevance in video retrieval—(1) adaptive video indexing and (2) elemental concept indexing. Both approaches make extensive use of audiovisual cues. In the former, retrieval is performed by using implicit semantic indices through audio and visual features. Audio features are extracted by statistical time-frequency analysis ...

متن کامل

A Framework Integrating Signal / Semantic Visual Characterizations for Conceptual Video Retrieval

Most of the video indexing and retrieval systems suffer from the lack of a comprehensive video model capturing the image semantic richness, the conveyed signal information and the spatial relations between visual entities. To remedy such shortcomings, we present in this paper a video model integrating visual semantics, spatial and signal characterizations. It relies on an expressive representat...

متن کامل

MPEG-7 audio-visual indexing test-bed for video retrieval

This paper reports on the development status of a Multimedia Asset Management (MAM) test-bed for content-based indexing and retrieval of audio-visual documents within the MPEG-7 standard. The project, called "MPEG-7 AudioVisual Document Indexing System" (MADIS), specifically targets the indexing and retrieval of video shots and key frames from documentary film archives, based on audio-visual co...

متن کامل

Semantic Duplicate Identification with Parsing and Machine Learning

Identifying duplicate texts is important in many areas like plagiarism detection, information retrieval, text summarization, and question answering. Current approaches are mostly surface-oriented (or use only shallow syntactic representations) and see each text only as a token list. In this work however, we describe a deep, semantically oriented method based on semantic networks which are deriv...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Computer Vision and Image Understanding

دوره 113  شماره 

صفحات  -

تاریخ انتشار 2009